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Variance  Reduction  in  Monte  Carlo  Simulation 


by 

Mark  Brown 
Herbert  Solomon 
and 

Michael  A.  Stephens 


1.  Introduction. 


Monte  Carlo  simulation  is  employed  in  a large  variety  of  problems.  Fre 

quently,  one  is  interested  in  the  expectation  of  a function  g(X^, . . . ,X^) 

where  < i > 1 > is  i.i.d.  with  known  distribution  F and  N is  a 

stopping  time  (often  a constant).  The  procedure  followed  is  to  generate 

/ • \ / • \ 

a large  number  of  samples  (X^1  X^  ),  i = 1,2, . ..,M,  and  estimate 

l 

the  expectation  of  interest  by 

H 

1 r a (xi1  ^ x.(i)) 

M g(Xl  ; * 

1=1  l 


An  interesting  aspect  of  the  simulation  estimation  problem  is  that 
F is  known.  Thus  functions  of  the  fomn  £ (F,X^, . . . ,X^)  can  be  employed 
an  estimators,  while  in  statistical  estimation  problem  with  F unknown 
£ cannot  be  computed  from  the  data  and  is  thus  not  considered  to  be  an 
estimator.  Thus  the  class  of  estimators  is  considerably  wider  in  Monte 
Carlo  problems. 

One  approach  available  to  reduce  the  variance  of  the  Monte  Carlo 
estimator  is  to  find  a function  £ (F,X^, . . . ,X^)  with  the  same  expectation 
as  g,  and  with  smaller  variance.  Then  £ rather  than  g is  averaged 
over  the  M samples.  Of  course,  £ = E^g  fits  this  description  but  were 
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it  directly  computable  one  would  not  need  to  simulate  in  the  first  place. 

Thus  an  important  requirement  of  i,  is  that  it  be  simply  computable. 

We  illustrate  the  above  remarks  by  considering  the  problem  of  Monte 
Carlo  estimation  of  M(t)  = EN(t),  the  expected  number  of  renewals  in 
[0,t]  for  a renewal  process  with  known  interarrival  time  distribution  F. 
Several  unbiased  estimators  which  compete  favorably  with  the  naive  estimator, 
N(t),  are  presented  and  studied. 

We  believe  that  our  approach  and  methodology,  although  only  applied  to 
renewal  function  estimation  in  this  paper,  can  be  useful  in  a large 
variety  of  Monte  Carlo  simulation  problems. 


2.  Assume  that  < Xp,  i > 1 > is  i.i.d.  with  cdf  F where  F(0)  = 0. 

Define  S = 0,  S = Z X. , n = 1,2, . . . ,N(t ) = maxfn:  S < t},  and 
o n p l n — 

M(t)  = EW(t),  t > 0.  Sometimes  we  consider  the  point  t = 0 as  a renewal 
epoch.  In  this  case  we  use  NQ(t)  = N(t)+1  and  M^t)  = M(t)+1.  The 
renewal  age  at  time  t is  defined  by  A(t)  = t-S^^;  Pr(A(t)  = t)  = F(t) 
and  dF^^  = F(x)dM(t-x)  for  0 < x < t,  thus  dF^^  = F(x)dl^-)(t-x) 
for  0 < x < t. 


Define 


6.  = 

l 


1 if  S.  < t 

l — 


0 if  S.  > t . 

l 


00  CO  00  ft  \ 

Then  N(t)  = Z 6.  and  M(t)=EZ  6.  =ZFl  ;(t),  where  Fv 1 is  the  i™ 

1 1 1 1 1 

convolution  of  F. 
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/ • \ 

To  estimate  F 1 (t)  = E&^  we  will  use 

E(&i|X1,...,Xi_i)  = E(8i|Si_1)  = . 

We  then  estimate  M(t)  by: 

00  N(t)+1 

(1)  M^(t)  = E p(t-Si_i)  = 1 Fft-S^)  . 

i=l  i=l 

Since  Var(F(t-S^  ))  = Var[E(S^|S^  )]  < Var  8^,  we  have  replaced 
each  component,  8^,  by  a component  with  the  same  expectation  and 
smaller  variance.  Intuitively  we  would  expect  that  if  we  reduce  the 
variability  at  each  stage  (given  the  past)  then  we  should  reduce  the 
variability  of  the  overall  estimator.  However,  the  computation  of 
variance  involves  covariance  terms,  and  if  these  are  increased  while 
variances  are  decreased  there  can  conceivably  be  an  increase  in  variance. 
Theorem  1 (below)  demonstrates  that  Mp(t)  does  indeed  have  lower  variance 
than  M(t). 

Theorem  1.  Mp(t)  is  an  unbiased  estimator  of  M(t)  and  Var  N(t)  - Var  Mg,(t) 
E[2M(A(t))  -F(A(t))]  > 0,  with  strict  equality  if  F(t)  > 0. 

Before  proving  theorem  1 we  comment  that  the  reduction  in  variance 
is  unsatisfactorily  small  for  large  t.  If  = EX2  < oo  then 
E[2M(A(t) ) - F(A(t) )]  =0(1)?  thus  Var  N(t)  and  Var  M^(t)  are  of  the 
form  7t  +0(l)  with  common  7,  and  we  improve  only  the  asymptotically 
negligible  0(l)  term.  Estimators  considered  in  later  sections  do 
considerably  better  for  large  t. 
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Proof  of  Theorem  1.  Express  M^(t)  as 


F(t)  + f F(t-x)cLN(x)  = f F(t-x)dN„(x) 
■Jn  jq  u 


Then 


EMpCt)  = J F(t-x)dM0(x)  = J ldMQ (x)  - J F(t-x)diy^ (x) 
= t^(t)  - J dF^}  = l^(t)-l  = M(t)  . 


Now, 


EM^(t)  = J F2(t-x)dMQ(x) 

// 


+ 2 JJ  F(t-r)F(t-s  )d^)(r)d^)  (s-r) 
r < s 


We  evaluate  this  expression  in  several  steps: 

(i)  J F2(t-x)dN^(x)  = J F(t-x)d^j(x)  - j F(t-x)F(t-x)dN^(x) 


= M(t)  - EF(A(t))  . 


(ii)  F(t-r)F(t-s)  = l-F(t-r)  - F(t-s)  + F(t-r)F(t-s)  . 


(ill)  2 


(iv)  -2 


JJ  ldI^(r)dM0(s-r)  = 2 J M(t-r)dMQ  (r)  = 2M(t ) + 2M*'2'1  (t) 
JJ  F(t-r )dM0(r)dM0(s-r)  = -2  f F(t-r)M(t-r)dl^(r) 


r < s 


= -2EM(A(t))  . 


(v)  -2 


JJ  F(t-s  )dM0(r)dI^)(s-r)  = -2  f F(t-r)dM0(r) 

<<  q J r»— n 


r < s 


r=0 
= -2M(t)  . 


(vi)  2 JJ  F(t-r)F(t-s)dJy^(r)dM0(s-r)  = 2 f F(t-r)F(t-r)dM  (r) 

r < s Jr=0 


= 2EF(A(t))  . 


Combining  (i)-(vi)  we  obtain: 


(2)  EN^(t)  = M(t)  + 2M^(t)  - E(2M(A(t))  - F (A(t ) ) ) 


Furthermore 


(3)  EJfP(t)  = E[  f ldN(t)]2=  M(t)  + 2 JJ  dM(r )dM(s-r ) 

Jo  r < s 


= M(t)  + 2M^')(t)  . 


Thus  from  (2 ) and  (3 ) : 


Var  N(t)  - Var  Mp(t)  = E[2M(A(t))  - F(A(t))] 


Since 


M(s) 


£ F^(s),  2M(s)  -F(s)  = F(s)  + 2 £ F^(s)  > 0 ; 

i=l  i=2 
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thus  E[2M(A(t ))  - F(A(t ))  ] >0  for  all  t and  is  strictly  positive 
for  F(t)  > 0. 


3.  In  this  section  we  assume  that  F is  continuous.  The  cumulative 

hazard  H is  defined  by  H(t)  = -log  F(t).  'When  F is  absolutely 

continuous  with  density  f then  H(t)  = h(y)dy  where  h is  the 

f(t) 


hazard  function,  h(t)  = 


F(t) 


Our  next  estimator  is  based  on  the  intuitive  idea  that 

ft 
>0 


E(dN(s  )|past ) = dH(A(s)).  Thus  instead  of  using  N(t)  = /^dWCs)  we 
try 


ft  N(t) 

^(t)  = dH(A(s  ) ) = £n(X.)  + H (A (t ) ) 


N(t)+1 


H. 

1 


where  IL  = H[  (t-S.^  1)  A X.^]  (where  a A b = min(a,b)). 

°°  00 

Note  that  N(t)  = Z 5.  while  M^(t)  = 2 IL.  Thus  is  replaced 

by  H.,  and  E(5i|Si_1)  = E(Hi|Si_1)  = FCt-S.^)  . 

The  process  M^.(t)  is  a cumulative  process  in  the  sense  of  Smith 
[3].  Thus  (Smith  [3]) 

Var  ^(t)  ~ ^ E[H(X)  - (^p^-)X]2  , 


where  p = EX.  But  H(x)  = -log  F(X)  is  exponentially  distributed  with 
parameter  1,  thus: 


E[H(X)  - X]2  = 1 + % - ^ , 

d ^ 
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where  p is  the  correlation  coefficient  between  X and.  H(x)  and  cr 
is  the  variance  of  X.  Thus  M^(t)  is  asymptotically  better  than  N(t) 
for  p > \x/2<j,  asymptotically  worse  than  N(t)  for  p < p/2cr. 

In  general  if  we  have  two  unbiased  estimators  of  a parameter,  T 
and  with  covariance  matrix  A,  then  the  minimum  variance  unbiased 

estimator  of  the  form  ctT^  + (l-CC)^  is  the  one  with 


a = 


3=1  3 

2 2 

E 

i=l  3=1  J 


The  variance  of  this  estimator  is 


The  idea  now  is  to  let  A be  the  asymptotic  covariance  matrix  of 


(N(t2 

A 


Vt). 

A 


and  to  employ  the  above  result  to  obtain  an  unbiased  estimator  which 
improves  on  both  M^t)  N(t)  for  large  t.  We  already  know  the 

0(t)  terms  for  Var  N(t)  and  Var  M^('t).  We  only  need  the  leading 
term  for  Cov(W(t),  M^(t)).  This  is  given  in  lemma  1 below. 

2 

Lemma  1.  If  cr  is  finite  then 


Cov(N(t),  M^t))  = J + °(t) 
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Proof. 


Var (N(t)  - Mpj(t))  = Var  £ (8i-H(t-S±_1  AX.)) 


= £ E Var[51-  HCt-S.^  A X±)|S±_1]  = E £ Fft-S^)  = EN(t)  = M(t) 


Thus 

M(t)  = Var(N(t)  - Mg(t))  = Var  N(t)  + Var  (Mg  (t) ) - 2Cov(N(t  ),Mg(t) ) , 

and  therefore 

Cov(N(t),Mg(t))  = | [Var  N(t)+Var  Mg(t)-M(t)] 

-|r[4  + 4 + 1 -v  ■1  + o(1)] 

M-  M-  ^ 

» £ <4  - f£)+o(t).  || 

^ (i 

Now 


A = y 

VM 


2 

Q_  op 
2 " n 


2 v 

0_  _P  \ 

.,2  " H \ 


2_  + i _ 2£P 
N2  “ 


) 


and 


a = 


lA 


2 2 

E Ea; 

i=l  3=1 


= 1 - 


crp 


-1 

id 
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1 


A?* 

ij 


cr  H 2 s 
= (1-P  ) 


Note  that  the  asymptotic  relative  savings  in  variance  is  p the 
square  of  the  correlation  coefficient  between  X and  H(X).  Summarizing: 
Theorem  2.  The  estimator 

M*(t)  = (l-ffi)N(t)  +2^(10 

is  an  -unbiased  for  M(t)  with  variance 

*4  (i-p2) + oft) 
n 

(p  is  the  correlation  coefficient  between  X and  H(X)).  It  follows 
that: 


Var  N(t)  - Var  M (t)  2 . N 

Var  W(t) ^ * c + o(l)  ' 


2 


Example:  Let  H(x)  = x£,  F(x)  = e~X  . Then, 


-L 


CO  2 /—  /.  00  2 /— 

, -x  , J*.  f 1 -x  , Jit 

>0  2J- oo  2 


EX2  = 


-f 

Jo 


00  2 

xe~X  dx  = 1 , 


thus 


2 rt  4-n  1 r f 4 -x2 , u_  1 / Jt  2 

a ■ 1 " V ' TT  ; p ’ ? !/2x  e = 2 VIST’0  * 


4(4-n) 


.915 
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Thus  in  this  case  (Weibull  with  shape  parameter  2)  the  unbiased  estimator 
M (t)  has  an  asymptotic  relative  reduction  in  risk  over  W(t)  of 
91.5  percent.  || 

Integration  by  parts  shows  that 

1 f 00 

P = - / H(x)F(x  )dx  ; 

aJ  0 

since  H(x)  = -log  F(x)  the  integral  can  probably  be  given  an  enthropy 
interpretation.  Also  p = — EH(X)  where  H(x)  = f*  H(z)dz.  This  is 
true  since 

/■  00  _ f 00  r 00  f X 

J H (x )F (x )dx  = J H(x)EIJC>x  dx  = E J H(x)lx>xdx  = E J H(x)dx  = EH(X). 

per 

Note  that  both  p and  — are  invariant  under  a change  of  time  scale, 
t -*■  ct,  c > 0. 


00 


4.  In  section  3 we  estimated  M(t)  by  a weighted  average  of  W(t)  = E 8. 

N (t  )+l  1 1 

and  Mp(t)  = Z H((t-S^_-^)  A X^).  Now  we  apply  the  same  idea  but 

stagewise.  At  stage  i,  having  observed  X^,...,X^  N(t)  adds  the 

component  8^  = 1^  g > while  M^(t)  adds  IL  = H((t-S^_-^)  A X^). 

Each  of  8^,  IL  are  conditionally  (given  unbiased  for  F(t-S^  1) 

and  unconditionally  unbiased  for  F^(t).  The  approach  we  now  follow  is 
to  use  the  weighted  average  of  8^  and  IL  which  has  smallest  conditional 
variance  given  X^,...,X^ 
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Define  F^  = F(t-S  ^),  CL  = H(t-S  ^).  Then: 
TarfeiISi_1)  = P.-F^ 


Cov(5.,H.  |S.  _ ) = F.  (F.  -C.  ) 
v i'  x ' x-1  11  1 7 


Var(Hi|Si_1) 


F. +F. (F.-2C. ) . 
1 11  1 


The  mininrum  conditional  variance  (given  X^,...,X^  ^ ) unbiased  linear 
combination  is  then: 


L,  = (1  - 


C.F. 


F. 

l 


C.F. 

l l 

F. 

l 


H. 

l 


The  corresponding  estimator  of  M(t)  is: 

N(t)+1  H(t-S  )F (t-S.  , ) 

«Llt>  = H<t)  - Z ~ PTE--)--  - <W  ' 

1 V 1 -1 

We  do  not  know  now  M^(t)  compares  with  the  other  estimators  we 

have  looked  at.  The  variance  of  an  estimator  of  the  form  X is 

X Var  K.  + 2 X Cov(K.,K.);  L.  was  chosen  from  among  a class  of 
1 i < j 1 0 1 

estimators  X to  minimize  X Var  However  we  know  very  little 

about  Cov(L. ,L.).  This  latter  quantity  must  be  shown  to  be  suitably 
small  in  order  to  demonstrate  that  M^(t)  has  desirable  variance 
properties. 
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5.  We  next  consider  an  unbiased  estimator  with  asymptotic  variance 
0(1 )•  Thus  it  asymptotically  enjoys  a 100  percent  reduction  in 
variance  over  N(t). 

As  is  well  known  N(t)+1  is  a stopping  time  and  thus  by  Wald's 
identity: 

N (t  )+l 

ESK(t)+l-E  r x.  = u(M(t)+l)  . 

Thus 

M(t, , . ! 

M- 

is  -unbiased  for  M(t).  Wow  Var(SN^+1)  = Var(t+Z(t))  = Var  Z(t), 
where  Z(t)  is  the  forward  recurrence  time  at  t.  If  = EX^  < oo 
then  Var  Z(t)  converges  to 

2 2 

12n2 

as  t ■*  oo  . Thus 

A.  W*  -3dp 
Var  M(t)  ■> , 

12p 

and  is  thus  0(1  )• 
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